Abstract. This paper, the first step to connect relational databases
with systems consequence (Kent [5]), is concerned with the semantics
of relational databases. It aims to study system consequence in the
logical/semantic system of relational databases. The paper, which was
inspired by and which extends a recent set of papers on the theory of
relational database systems (Spivak [6], [7]), is linked with work on the
Information Flow Framework (IFF [9]) connected with the ontology
standards effort (SUO), since relational databases naturally embed into first
order logic. The database semantics discussed here is concerned with the
conceptual level of database architecture. We offer both an intuitive and
technical discussion. Corresponding to the notions of primary and foreign
keys, relational database semantics takes two forms: a distinguished form
where entities are distinguished from relations, and a unified form where
relations and entities coincide. The distinguished form corresponds to the
theory presented in (Spivak [6]). The unified form, a special case of the
distinguished form, corresponds to the theory presented in (Spivak [7]).
A later paper will discuss various formalisms of relational databases,
such as relational algebra and first order logic, and will complete the
description of the relational database logical environment.

1 Introduction

The author's "Systems Consequence" paper (Kent [5]) is a very general theory
and methodology for specification and inter-operation of systems of
information resources. The generality comes from the fact that it is
independent of the logical/semantic system (institution) being used. This is
a wide-ranging theory, based upon ideas from information flow (Barwise and
Seligman [1]), formal concept analysis (Wille and Ganter et al [2]), the
theory of institutions (Goguen et al [3]), and the lattice of theories notion
(Sowa), for the integration of both formal and semantic systems independent
of logical environment. In order to better understand the motivations of that
paper and to be able more readily to apply its concepts, it will be important
to study system consequence in various particular logical/semantic systems.
This paper aims to do just that for the logical/semantic system of relational
databases. The paper, which was inspired by and which extends a recent set of
papers on the theory of relational database systems (Spivak [6], [7]), is
linked with work on the Information Flow Framework (IFF [9]) connected with
the ontology standards effort (SUO), since relational databases naturally
embed into first order logic. We offer both an intuitive and technical
discussion. Corresponding to the notions of primary and foreign keys,
relational database semantics takes two forms: a distinguished form where
entities are distinguished from relations, and a unified form where relations
and entities coincide. The distinguished form corresponds to the theory
presented in the paper (Spivak [6]). We extend Spivak's treatment of tables
from the static case of a single entity classification (type specification)
to the dynamic case of classifications varying along infomorphisms. Our
treatment of relational databases as diagrams of tables differs from Spivak's
sheaf theory of databases. The unified form, a special case of the
distinguished form, corresponds to the theory presented in the paper
(Spivak [7]). The unified form has a graphical presentation, which
corresponds to the sketch theory of databases (Johnson and Rosebrugh [4]) and
the resource description framework (RDF). This paper, which is the first step
to connect relational databases with systems consequence, is concerned with
the semantics of relational databases. A later paper will discuss various
formalisms of relational databases, such as relational algebra and first
order logic. Section 2 discusses the relational data model. Section 3
describes our representation for the table concept, both defining a category
of tables, and proving that this category is complete (joins exist) and
cocomplete (unions exist). Section 4 represents the relational database
concept as a diagram of tables linked by the generalization-specialization of
projections. Morphisms of relational databases are defined. Canonical
examples of both are discussed. Finally, Section 5 summarizes the results and
gives some concluding remarks.

2 Relational Data Model 

The paper defines an architectural semantics for the relational data model.¹
All information in the relational model is represented within relations. A
relational database is a collection of relations (relational tables, or just
tables). A table is represented as an array, organized into rows and columns.
The rows are called the tuples (records) of the table, whereas the columns
are called the attributes of the table. Both rows (tuples) and columns are
unordered. In the basic relational data model all the components can be
resolved into sets and functions.²

The basic relational building block is the data domain represented by an
entity type $x \in X$, where $X$ is the type set of an entity classification
$\mathcal{E} = \langle X, Y, \models_{\mathcal{E}} \rangle$, whose instance
set is a universe of data values $Y$ local to the database. An entity
instance $y \in Y$ is classified by an entity type $x \in X$ when
$y \models_{\mathcal{E}} x$. Within the classification $\mathcal{E}$ the
entity type $x \in X$ represents its extent, which is the domain of data
values $ext_{\mathcal{E}}(x) = \{ y \in Y \mid y \models_{\mathcal{E}} x \}$.
We extend the classification to generalized elements. An indexed collection
of entity types $\{ (i, s_i) \mid i \in I, s_i \in X \}$ is called an
$X$-signature. It is denoted by the pair $(I, s)$ and represented as a map
$I \xrightarrow{s} X$ from index set to entity type set. An indexed
collection of entity instances $\{ (j, t_j) \mid j \in J, t_j \in Y \}$ is
called an $\mathcal{E}$-tuple. A tuple represents an object; either a
concrete, physical object or an abstract, conceptual object. It is denoted by
the pair $(J, t)$ and represented as a map $J \xrightarrow{t} Y$ from index
set to the universe. The indexing set is called the arity of the signature or
tuple. An $\mathcal{E}$-tuple $(J, t)$ is classified by an
$\mathcal{E}$-signature $(I, s)$, denoted by $t \models_{\mathcal{E}} s$,
when they have the same arity $J = I$ and enjoy pointwise classification
$t_i \models_{\mathcal{E}} s_i$ for all $i \in I$. The extent of an
$\mathcal{E}$-signature $(I, s)$ is its tuple set
$tup_{\mathcal{E}}(I, s) = \{ t \mid t \models_{\mathcal{E}} s \}$.

1 Older architectures of data include the hierarchical model and network
model. Of these, nothing will be said. A newer architecture of data, called
the object-relation-object model, is a presentation form for the relational
data model described here.

2 The basic relational data model is defined on the category Set of sets and
functions.
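
To make the definitions of extent, tuple classification and tuple set concrete, the following is a minimal sketch for a finite classification. The types, instances and incidence pairs are hypothetical values chosen for illustration; signatures and tuples are modeled as Python dictionaries, which is our encoding and not part of the formal development.

```python
from itertools import product

# Hypothetical toy classification E = (X, Y, |=); these values are illustrative only.
X = {"Str", "Num"}
Y = {"Plato", "Italy", 7}
INCIDENCE = {("Plato", "Str"), ("Italy", "Str"), (7, "Num")}


def classifies(y, x):
    """Return True when instance y is classified by type x (y |= x)."""
    return (y, x) in INCIDENCE


def ext(x):
    """Extent of an entity type: the instances it classifies."""
    return {y for y in Y if classifies(y, x)}


def tuple_classified(t, s):
    """t |= s: equal arity and pointwise classification t[i] |= s[i]."""
    return t.keys() == s.keys() and all(classifies(t[i], s[i]) for i in s)


def tup(s):
    """Tuple set of signature s: every tuple that s classifies."""
    arity = sorted(s)
    domains = [sorted(ext(s[i]), key=str) for i in arity]
    return [dict(zip(arity, values)) for values in product(*domains)]


signature = {"name": "Str", "age": "Num"}  # a signature is a map I -> X
print(tup(signature))
```

Because the tuple set is the product of the extents of the component types, this enumeration is only feasible for small finite universes.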

Let $\mathcal{T}$ be a relational table in a database based on the entity
classification $\mathcal{E}$. An attribute of $\mathcal{T}$ is an ordered
pair $(i, s_i)$ consisting of an attribute name $i \in I$ and an entity type
$s_i \in X$, where $I$ is the arity of the table. The collection of
attributes of $\mathcal{T}$ forms its schema $(X, I, s)$, where $(I, s)$ is
an $X$-signature. A tuple of $\mathcal{T}$ is an $\mathcal{E}$-tuple that is
classified by the table signature $(I, s)$. Hence, the tuple set of
$\mathcal{T}$ is the set $tup_{\mathcal{E}}(I, s)$. Each tuple of
$\mathcal{T}$ must be uniquely identifiable by some combination (one or more)
of its attribute values. This combination is referred to as the primary key.
Without loss of generality, we assume that (primary) keys are single
attributes. In addition, we conceptually separate the primary key attribute
from the rest of the table and use it for indexing. Hence, the table
$\mathcal{T}$ is an indexed collection of $\mathcal{E}$-tuples
$\mathcal{T} = \{ (k, t_k) \mid t_k \in tup_{\mathcal{E}}(I, s), k \in K \}$,
where $K$ is the set of primary keys of the table; that is, the table is
represented as a map $K \xrightarrow{t} tup_{\mathcal{E}}(I, s)$ from keys to
tuples.

Here is a small example of a relational database for a company in unified
form, which illustrates both primary keys and foreign keys. It contains
two relational tables, an employee table Emp and a department table Dept,
which are indexed by primary keys and linked by foreign keys.



emp:Emp    name:Str     addr:Str    dept:Dept
e1         Plato                    d1
e2                      Italy       d1
e3         Descartes    France      d1

dept:Dept  name:Str     mngr:Emp
d1                      e3
d2         Production   e2



In this example, the entity (relation) types are Dept, Emp and Str. In the
employee relational table Emp, the arity is {name, addr, dept}, the signature is
{(name, Str), (addr, Str), (dept, Dept)}, and the (primary) key set is {e1, e2, e3}.
Dotted items indicate relations (types or instances) being used as entities, since
this is in unified form.

In the relational data model, there are three inherent integrity constraints:
entity integrity, domain integrity, and referential integrity. Entity
integrity asserts that every table must have a primary key column in which
each entry identifies its own row (tuple). Domain integrity asserts that each
data entry in a column must be of the type of that column. Entity and domain
integrity are requirements for the distinguished form of database semantics.
Entity integrity says there must be a tuple function from the set of
(primary) keys, and domain integrity says that image tuples must be
classified by the table signature $(I, s)$. Hence, entity and domain
integrity assert the existence of the tuple or content function
$t : K \to tup_{\mathcal{E}}(I, s)$. Referential integrity asserts that each
entry in a foreign key column of a referencing table must occur in the
primary key column of the referenced table. Referential integrity is a
requirement for the unified form of database semantics. Referential integrity
says there must be a function from a foreign key column of a referencing
table to the primary key column of the referenced table. Hence, referential
integrity asserts the existence of the functions in the sketch interpretation
of a relational database.
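
The three integrity constraints can be checked mechanically. The sketch below assumes a hypothetical unified-form database loosely modeled on the Emp/Dept example; the dictionary layout, the function name `integrity_violations`, and the treatment of Str as a datatype whose values are their own keys are our illustrative choices. Entity integrity holds by construction here, since each table's rows are stored as a function (dictionary) from primary keys to tuples.

```python
# Hypothetical unified-form database, loosely modeled on the Emp/Dept example.
# "Str" is treated as a datatype whose values are their own keys.
DATABASE = {
    "Emp": {
        "signature": {"name": "Str", "dept": "Dept"},
        "rows": {
            "e1": {"name": "Plato", "dept": "d1"},
            "e3": {"name": "Descartes", "dept": "d1"},
        },
    },
    "Dept": {
        "signature": {"name": "Str", "mngr": "Emp"},
        "rows": {"d1": {"name": "Production", "mngr": "e3"}},
    },
}


def integrity_violations(db):
    """List (table, key, attribute, kind) for each broken constraint."""
    violations = []
    for name, table in db.items():
        signature = table["signature"]
        for key, row in table["rows"].items():
            if row.keys() != signature.keys():
                # The tuple does not have the arity of the table signature.
                violations.append((name, key, None, "arity"))
                continue
            for attr, typ in signature.items():
                value = row[attr]
                if typ in db:
                    # Referential integrity: a foreign key entry must be a
                    # primary key of the referenced table.
                    if value not in db[typ]["rows"]:
                        violations.append((name, key, attr, "referential"))
                elif not isinstance(value, str):
                    # Domain integrity for the Str datatype.
                    violations.append((name, key, attr, "domain"))
    return violations


print(integrity_violations(DATABASE))
```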

The information in a database is accessed by specifying queries, which use 
operations such as select to identify tuples, project to identify attributes, and join 
to combine tables. In this paper, projection refers to a primitive generalization- 
specialization operation between pairs of relational tables (they are specified 
by the database schema, project from joined table to components, or other), 
whereas join is a composite operation on a linked collection of tables. Selection 
is a special case of join, which uses reference relations (tables). 

3 Tables 

A table (database relation) $\mathcal{T} = (\mathcal{S}, \mathcal{E}, K, t)$
has an underlying (simple) schema $\mathcal{S} = (X, I, s)$ with a set of
entity types $X$ and an $X$-signature $(I, s) \in (\mathbf{Set}{\downarrow}X)$,
an entity classification
$\mathcal{E} = \langle X, Y, \models_{\mathcal{E}} \rangle$ with a common
(entity) type set component $X \in \mathbf{Set}$ and a local universe of
entity instances $Y \in \mathbf{Set}$, a set $K$ of (primary) keys, and a
tuple function $t : K \to tup_{\mathcal{E}}(I, s)$ mapping keys to
$\mathcal{E}$-tuples of type (signature) $(I, s)$. Equivalently, it is an
object in the comma category of $\mathcal{E}$-tables
$\mathcal{T} \in (\mathbf{Set}{\downarrow}tup_{\mathcal{E}})$.

A table morphism (morphism of database relations)
$(h, f, g, k) : \mathcal{T}_1 = (\mathcal{S}_1, \mathcal{E}_1, K_1, t_1) \to (\mathcal{S}_2, \mathcal{E}_2, K_2, t_2) = \mathcal{T}_2$
consists of a (simple) schema morphism
$(h, f) : \mathcal{S}_2 = (X_2, I_2, s_2) \to (X_1, I_1, s_1) = \mathcal{S}_1$
with a function on entity types $f : X_2 \to X_1$ and an $X_1$-signature
morphism $h : \Sigma_f(I_2, s_2) = (I_2, s_2 \cdot f) \to (I_1, s_1)$, an
entity infomorphism
$(f, g) : \mathcal{E}_2 = \langle X_2, Y_2, \models_{\mathcal{E}_2} \rangle \rightleftarrows \langle X_1, Y_1, \models_{\mathcal{E}_1} \rangle = \mathcal{E}_1$
with a common (entity) type function component $f : X_2 \to X_1$ and a
universe (entity instance) function $g : Y_1 \to Y_2$, and a key function
$k : K_1 \to K_2$, which satisfy the condition
$k \cdot t_2 = t_1 \cdot tup(h, f, g)$, where
$tup(h, f, g) = tup_{\mathcal{E}_1}(h) \cdot \tau_{(f,g)}(I_2, s_2) = (h \cdot (-)) \cdot ((-) \cdot g) : tup_{\mathcal{E}_1}(I_1, s_1) \to tup_{\mathcal{E}_2}(I_2, s_2)$.³
Table morphisms are illustrated in Figure 1. Here we see that table morphisms
have the pleasing property that corresponding entries in the source and
target tables satisfy the infomorphism condition from the theory of
information flow (Barwise and Seligman [1]). Composition of morphisms is
defined component-wise. Let $\mathbf{Tbl}$ denote the category of tables
(database relations) with the two projections ($sch$ is called the schema
functor)
$\mathbf{Sch} \xleftarrow{sch} \mathbf{Tbl}^{op} \xrightarrow{cls} \mathbf{Cls}$
and the key functor $\mathbf{Tbl} \xrightarrow{key} \mathbf{Set}$.

3 Since the table tuple function embodies the entity/domain integrity
constraints (Section 2), this condition on morphisms asserts the preservation
of data integrity.
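
The table morphism condition can be tested entry by entry on finite tables. In the following sketch all functions ($h$, $f$, $g$, $k$) are encoded as dictionaries, and the two tables, their attribute names and their values are hypothetical. The helper `tup_map` implements the composite $(h \cdot (-)) \cdot ((-) \cdot g)$ in diagrammatic order: reindex a source tuple along $h$, then translate its entries along $g$.

```python
def tup_map(h, g, tuple_1):
    """tup(h, f, g): send a tuple over I1 to the tuple i2 -> g(tuple_1[h(i2)]) over I2."""
    return {i2: g[tuple_1[i1]] for i2, i1 in h.items()}


def is_table_morphism(T1, T2, h, f, g, k):
    """Check s2 . f = h . s1 and k . t2 = t1 . tup(h, f, g) on finite data."""
    s1, t1 = T1["s"], T1["t"]
    s2, t2 = T2["s"], T2["t"]
    schema_ok = h.keys() == s2.keys() and all(f[s2[i2]] == s1[h[i2]] for i2 in s2)
    tuples_ok = all(t2[k[k1]] == tup_map(h, g, t1[k1]) for k1 in t1)
    return schema_ok and tuples_ok


# Hypothetical source table T1 and target table T2 (a projection with renaming).
T1 = {
    "s": {"name": "Str", "addr": "Str"},
    "t": {"e1": {"name": "Plato", "addr": "unknown"},
          "e3": {"name": "Descartes", "addr": "France"}},
}
T2 = {
    "s": {"label": "Text"},
    "t": {"p1": {"label": "PLATO"}, "p3": {"label": "DESCARTES"}},
}
h = {"label": "name"}                      # arity function I2 -> I1
f = {"Text": "Str"}                        # type function X2 -> X1
g = {v: v.upper() for v in ["Plato", "unknown", "Descartes", "France"]}  # Y1 -> Y2
k = {"e1": "p1", "e3": "p3"}               # key function K1 -> K2

print(is_table_morphism(T1, T2, h, f, g, k))
```

Swapping the two values of `k` breaks the tuple condition, which is one way to see that the key function and the instance function must agree on corresponding entries.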
[Figure 1: four-part diagram of a table morphism; the recoverable
annotations and the description follow.]

$k \cdot t_2 = t_1 \cdot (h \cdot (-)) \cdot ((-) \cdot g)$ and
$s_2 \cdot f = h \cdot s_1$.
For all $k_1 \in K_1$, $i_2 \in I_2$,
let $k_2 = k(k_1) \in K_2$, $i_1 = h(i_2) \in I_1$;
then $t_{2,k_2} = h \cdot t_{1,k_1} \cdot g$, $f(s_2(i_2)) = s_1(i_1)$ and
$t_{2,k_2,i_2} = g(t_{1,k_1,i_1})$;
hence $t_{2,k_2,i_2} \models_{\mathcal{E}_2} s_2(i_2)$ iff
$t_{1,k_1,i_1} \models_{\mathcal{E}_1} s_1(i_1)$.

$k \cdot t_2 = t_1 \cdot tup(h, f, g) = t_1 \cdot tup_{\mathcal{E}_1}(h) \cdot \tau_{(f,g)}(I_2, s_2)$

This four-part figure illustrates the condition on table morphisms. It has
been annotated to help guide the understanding. The condition is symbolically
stated in terms of set functions in the line of text just above. The top left
diagram illustrates the condition, and the bottom left diagram expands on
this. The top right diagram text is more detailed in terms of a source key
$k_1 \in K_1$. Here we see the appearance of the infomorphism condition

$t_{2,k_2,i_2} \models_{\mathcal{E}_2} s_2(i_2)$ iff $t_{1,k_1,i_1} \models_{\mathcal{E}_1} f(s_2(i_2))$.

Finally, the bottom right figure illustrates the effect of the morphism on
source/target tables $\mathcal{T}_1$ and $\mathcal{T}_2$.

Fig. 1. Table Morphism
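[Diagram: a table morphism
$\langle h, f, g, k \rangle : (\mathcal{S}_2, \mathcal{E}_2, K_2, t_2) \leftarrow (\mathcal{S}_1, \mathcal{E}_1, K_1, t_1)$
in $\mathbf{Tbl}^{op}$, shown with the projections of $\mathbf{Tbl}^{op}$ to
$\mathbf{Set}^{op}$ (keys), $\mathbf{Sdsgn}$, $\mathbf{Sch}$ (schema
morphism $\langle h, f \rangle : (X_2, I_2, s_2) \to (X_1, I_1, s_1)$) and
$\mathbf{Cls}$ (entity infomorphism), and the type functors $typ$ from
$\mathbf{Sch}$ and $\mathbf{Cls}$ to $\mathbf{Set}$ (type function
$f : X_2 \to X_1$).]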

We can have three indexing categories for tables: classifications, schemas,
or semidesignations. Each has its uses: classification indexing proves that
the category of tables is complete (and the fibers help explain database
fibers), semidesignation indexing proves that the category of tables is
cocomplete, and schema indexing follows the true formal-semantics distinction.



3.1 Classification Indexed Category. 

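For any fixed classification $\mathcal{E}$, the $\mathcal{E}$-th fiber
category with respect to the classification functor
$\mathbf{Tbl}^{op} \xrightarrow{cls} \mathbf{Cls}$, called the category of
$\mathcal{E}$-tables, is the comma category associated with the tuple functor
$tup_{\mathcal{E}} : (\mathbf{Set}{\downarrow}X)^{op} \to \mathbf{Set}$.

$\mathbf{Set} \xleftarrow{key_{\mathcal{E}}} \mathbf{Tbl}(\mathcal{E}) = (\mathbf{Set}{\downarrow}tup_{\mathcal{E}}) \xrightarrow{sign_{\mathcal{E}}} (\mathbf{Set}{\downarrow}X)^{op}$.

It has key and signature projection functors, an equivalent natural
transformation
$\tau_{\mathcal{E}} : key_{\mathcal{E}} \Rightarrow sign_{\mathcal{E}}^{op} \circ tup_{\mathcal{E}}$,
and is described as follows. A fiber object
$\mathcal{T} \in \mathbf{Tbl}(\mathcal{E})$, or an $\mathcal{E}$-table
(database $\mathcal{E}$-relation), is any table
$\mathcal{T} \in \mathbf{Tbl}$ with entity classification
$cls(\mathcal{T}) = \mathcal{E}$ and tuple (content) function
$t : K \to tup_{\mathcal{E}}(I, s)$ mapping each key (abstract tuple) to a
(concrete) $\mathcal{E}$-tuple in the extent of $(I, s)$. A fiber morphism in
$\mathbf{Tbl}(\mathcal{E})$ is any table morphism
$(h, k) : \mathcal{T} = (\mathcal{S}, \mathcal{E}, K, t) \leftarrow (\mathcal{S}', \mathcal{E}, K', t') = \mathcal{T}'$
in $\mathbf{Tbl}$ with identity infomorphism
$id_{\mathcal{E}} = (id_X, id_Y)$. It consists of a signature morphism
$h : (I, s) \to (I', s')$ and a key function $k : K' \to K$, which satisfy
the condition $k \cdot t = t' \cdot tup_{\mathcal{E}}(h)$.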

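Proposition 1. There is an indexed category of tables
$tbl : \mathbf{Cls}^{op} \to \mathbf{Cat}$ from (the opposite of) the
category of classifications and infomorphisms to the category of categories
and functors.⁴ The (opposite of the) fibered category corresponding to this
(its Grothendieck construction) is isomorphic to the category of tables with
the classification functor projection
$\mathbf{Tbl}^{op} \xrightarrow{cls} \mathbf{Cls}$.

Proposition 2. The category of $\mathcal{E}$-tables
$\mathbf{Tbl}(\mathcal{E}) = (\mathbf{Set}{\downarrow}tup_{\mathcal{E}})$ is
complete, its key projection functor
$\mathbf{Tbl}(\mathcal{E}) \xrightarrow{key_{\mathcal{E}}} \mathbf{Set}$ is
continuous and its signature projection functor
$\mathbf{Tbl}(\mathcal{E})^{op} \xrightarrow{sign_{\mathcal{E}}^{op}} (\mathbf{Set}{\downarrow}X)$
is cocontinuous.

4 The table indexing functor is the composition
$tbl = tup^{op} \circ comma \circ (-)^{op} : \mathbf{Cls}^{op} \to (\mathbf{Adj}{\Downarrow}\mathbf{Set})^{op} \to \mathbf{Cat} \to \mathbf{Cat}$
of a tuple functor
$\mathbf{Cls} \xrightarrow{tup} (\mathbf{Adj}{\Downarrow}\mathbf{Set})$ and a
comma category functor
$(\mathbf{Adj}{\Downarrow}\mathbf{Set})^{op} \xrightarrow{comma} \mathbf{Cat}$.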
Proof. (We have both an abstract and a useful concrete proof, but only have
room for the former.) The category $(\mathbf{Set}{\downarrow}X)^{op}$ is
complete, since $(\mathbf{Set}{\downarrow}X)$ is cocomplete. The tuple
functor $tup_{\mathcal{E}}$ is continuous.⁵

The category of $\mathcal{E}$-tables $\mathbf{Tbl}(\mathcal{E})$ is the
semantical domain for a relational database $\mathcal{D}$ with entity
classification $\mathcal{E}$. Completeness of $\mathbf{Tbl}(\mathcal{E})$
means that, not just binary, but database joins over arbitrary diagrams of
tables of $\mathcal{D}$ can be computed.

Proposition 3. For any infomorphism
$(f, g) : \mathcal{E}_2 \to \mathcal{E}_1$, the table fiber functor
$tbl_{(f,g)} : \mathbf{Tbl}(\mathcal{E}_1) \to \mathbf{Tbl}(\mathcal{E}_2)$
is continuous.

Continuity of $tbl_{(f,g)}$ means that database joins are preserved: database
joins of $\mathcal{E}_1$-tables are mapped to database joins of
$\mathcal{E}_2$-tables.
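
As an illustration of how a database join arises as a limit, the following sketch computes the pullback of two $\mathcal{E}$-tables over a shared table: keys are the pairs of source keys with a common image, the signature is the union of the two signatures, and each joined tuple merges the two component tuples. It assumes that the two signatures overlap exactly on the attributes of the shared table and that the signature maps are inclusions, so that the pushout of arities is a plain union. The tables, attribute names and values are hypothetical.

```python
def join(T1, T2, k1, k2):
    """Pullback of T1 -> T0 <- T2, given the key functions k1, k2 into the keys of T0.

    Assumes the signatures of T1 and T2 agree on their shared attributes,
    so the pushout of arities is the union of attribute names.
    """
    rows = {
        (a, b): {**T1["t"][a], **T2["t"][b]}
        for a in T1["t"]
        for b in T2["t"]
        if k1[a] == k2[b]          # the pair lies in the pullback of key sets
    }
    return {"s": {**T1["s"], **T2["s"]}, "t": rows}


# Hypothetical tables sharing the single attribute "dept".
EMP = {
    "s": {"name": "Str", "dept": "DeptId"},
    "t": {"e1": {"name": "Plato", "dept": "d1"},
          "e3": {"name": "Descartes", "dept": "d1"}},
}
DEPT = {
    "s": {"dept": "DeptId", "title": "Str"},
    "t": {"x1": {"dept": "d1", "title": "Research"},
          "x2": {"dept": "d2", "title": "Production"}},
}
# Key functions into the shared table, whose keys are the department identifiers.
k1 = {key: row["dept"] for key, row in EMP["t"].items()}
k2 = {key: row["dept"] for key, row in DEPT["t"].items()}

joined = join(EMP, DEPT, k1, k2)
print(joined["t"])
```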

Theorem 1. The category of tables Tbl is complete. 

Proof. The indexing category $\mathbf{Cls}$ is complete, the fiber category
$tbl(\mathcal{E})$ is complete for each classification $\mathcal{E}$, and the
fiber functor $tbl_{(f,g)} : tbl(\mathcal{E}_1) \to tbl(\mathcal{E}_2)$ is
continuous for each infomorphism $(f, g) : \mathcal{E}_2 \to \mathcal{E}_1$.
Hence, this is an application of a theorem of Tarlecki, Burstall and
Goguen [8].⁶

3.2 Schema Indexed Category. 

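For any fixed (simple) schema $\mathcal{S} = (X, I, s)$, the
$\mathcal{S}$-th fiber category $\mathbf{Tbl}(\mathcal{S})$ with respect to
the schema functor $\mathbf{Tbl}^{op} \xrightarrow{sch} \mathbf{Sch}$, called
the category of $\mathcal{S}$-tables, is the comma category with key and
$X$-classification projection functors

$\mathbf{Set} \leftarrow \mathbf{Tbl}(\mathcal{S}) = (\mathbf{Set}{\downarrow}tup_{\mathcal{S}}) \to \mathbf{Cls}(X)^{op}$.

It is described as follows. A fiber object
$\mathcal{T} \in \mathbf{Tbl}(\mathcal{S})$, or an $\mathcal{S}$-table
(database $\mathcal{S}$-relation), is any table
$\mathcal{T} \in \mathbf{Tbl}$ with (simple) schema
$sch(\mathcal{T}) = \mathcal{S}$. A fiber morphism in
$\mathbf{Tbl}(\mathcal{S})$ is any table morphism
$(g, k) : \mathcal{T} = (\mathcal{S}, \mathcal{E}, K, t) \leftarrow (\mathcal{S}, \mathcal{E}', K', t') = \mathcal{T}'$
in $\mathbf{Tbl}$ with identity (simple) schema morphism
$id_{\mathcal{S}} = (id_X, id_I)$. It consists of a universe (entity
instance) function $g : Y' \to Y$ defining an entity infomorphism
$(1_X, g) : \mathcal{E} = \langle X, Y, \models_{\mathcal{E}} \rangle \rightleftarrows \langle X, Y', \models' \rangle = g^{-1}(\mathcal{E}) = \mathcal{E}'$
and hence the presheaf morphism
$((\mathbf{Set}{\downarrow}X), tup_{\mathcal{E}}) \xleftarrow{\langle 1_X, g \rangle} ((\mathbf{Set}{\downarrow}X), tup_{\mathcal{E}'})$
with tuple natural transformation
$\tau_{(1_X, g)} : tup_{\mathcal{E}'} \Rightarrow tup_{\mathcal{E}}$, and a
key function $k : K' \to K$, which satisfy the condition
$k \cdot t = t' \cdot tup(g)$.⁸

5 If $\mathcal{A}$ and $\mathcal{B}$ are complete, and both
$T : \mathcal{A} \to \mathcal{C}$ and $S : \mathcal{B} \to \mathcal{C}$ are
continuous functors, then the comma category $(T{\downarrow}S)$ is also
complete and the projection functors $(T{\downarrow}S) \to \mathcal{A}$ and
$(T{\downarrow}S) \to \mathcal{B}$ are limit preserving.

6 If $C : \mathbf{I}^{op} \to \mathbf{Cat}$ is an indexed category such that
$\mathbf{I}$ is complete, $C_i$ is complete for all indices
$i \in \mathbf{I}$, and $C_\sigma : C_j \to C_i$ is continuous for all index
morphisms $\sigma : i \to j$, then $Gr(C)$ is complete.

7 The tuple functor
$tup_{\mathcal{S}} : \mathbf{Cls}(X)^{op} \to \mathbf{Set}$ maps an
$X$-classification $\mathcal{E} = \langle X, Y, \models \rangle$ to the tuple
set $tup_{\mathcal{S}}(Y, \models) = tup_{\mathcal{E}}(I, s)$ and maps an
$X$-infomorphism
$(1_X, g) : \mathcal{E}_2 = \langle X, Y_2, \models_2 \rangle \rightleftarrows \langle X, Y_1, \models_1 \rangle = g^{-1}(\mathcal{E}_2) = \mathcal{E}_1$
with instance function $g : Y_1 \to Y_2$ to the tuple function
$tup_{\mathcal{S}}(g) = \tau_{(1_X, g)}(I, s) = (-) \cdot g : tup_{\mathcal{S}}(Y_1, \models_1) = tup_{\mathcal{E}_1}(I, s) \to tup_{\mathcal{E}_2}(I, s) = tup_{\mathcal{S}}(Y_2, \models_2)$.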

Proposition 4. There is an indexed category of tables
$tbl : \mathbf{Sch}^{op} \to \mathbf{Cat}$, whose Grothendieck construction
(fibered category) is (the opposite of) the category of tables with the
schema functor projection $\mathbf{Tbl}^{op} \xrightarrow{sch} \mathbf{Sch}$.

3.3 Semidesignation Indexed Category. 

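A semidesignation $\mathcal{S} = (I, s, \mathcal{E})$ consists of a schema
$(X, I, s)$ and an entity classification
$\mathcal{E} = \langle X, Y, \models_{\mathcal{E}} \rangle$ with a common
(entity) type set component $X$. A semidesignation morphism
$(h, f, g) : \mathcal{S}_2 \to \mathcal{S}_1$ consists of a schema morphism
$(h, f) : \mathcal{S}_2 \to \mathcal{S}_1$ and an entity infomorphism
$(f, g) : \mathcal{E}_2 = \langle X_2, Y_2, \models_{\mathcal{E}_2} \rangle \rightleftarrows \langle X_1, Y_1, \models_{\mathcal{E}_1} \rangle = \mathcal{E}_1$
with a common (entity) type function component $f : X_2 \to X_1$. For any
semidesignation $\mathcal{S} = (I, s, \mathcal{E})$, the set of tuples of
$\mathcal{S}$ is $tup(\mathcal{S}) = tup_{\mathcal{E}}(I, s)$, the set of
$\mathcal{E}$-tuples in the extent of $(I, s)$.

Lemma 1. Any semidesignation morphism
$(h, f, g) : \mathcal{S}_2 \to \mathcal{S}_1$ defines a tuple function
$tup(h, f, g) : tup(\mathcal{S}_1) = tup_{\mathcal{E}_1}(I_1, s_1) \to tup_{\mathcal{E}_2}(I_2, s_2) = tup(\mathcal{S}_2)$.
Hence, there is a tuple functor $tup : \mathbf{Sdsgn}^{op} \to \mathbf{Set}$.

Proposition 5. The category of tables is the comma category

$\mathbf{Set} \leftarrow \mathbf{Tbl} = (\mathbf{Set}{\downarrow}tup) \to \mathbf{Sdsgn}^{op} \to \mathbf{Cls}^{op}$

associated with the tuple functor
$tup : \mathbf{Sdsgn}^{op} \to \mathbf{Set}$. The category of tables is
cocomplete.

Proof. The opposite category of semidesignations $\mathbf{Sdsgn}^{op}$ is
cocomplete, since $\mathbf{Sdsgn}$ is complete.⁹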



4 Relational Databases 

A relational database $\mathcal{D} = (\mathcal{S}, \mathcal{E}, K, \tau)$ is
a naturally connected diagram of tables. It has an underlying relational
database schema $\mathcal{S} = (\mathbf{R}, X, S)$¹⁰ with a category of
relation types (symbols) $\mathbf{R}$ linked by
generalization-specialization, a set of entity types $X$, and a signature
functor $S : \mathbf{R} \to (\mathbf{Set}{\downarrow}X)$, an entity
classification $\mathcal{E} = \langle X, Y, \models_{\mathcal{E}} \rangle$
with a common (entity) type set component $X$ and a local universe of entity
instances $Y$, a key functor $K : \mathbf{R}^{op} \to \mathbf{Set}$, and a
tuple natural transformation
$\tau : K \Rightarrow S^{op} \circ tup_{\mathcal{E}}$. Equivalently, it
consists of a table functor
$T : \mathbf{R}^{op} \to (\mathbf{Set}{\downarrow}tup_{\mathcal{E}})$, where
$\tau = T\tau_{\mathcal{E}}$ and
$\tau_{\mathcal{E}} : key_{\mathcal{E}} \Rightarrow sign_{\mathcal{E}}^{op} \circ tup_{\mathcal{E}}$
is the tuple natural transformation that is an integral component of the
comma category $(\mathbf{Set}{\downarrow}tup_{\mathcal{E}})$. Here are some
examples of relational databases.

Table. A table (database relation)
   $\mathcal{T} = (\mathcal{S}, \mathcal{E}, K, t)$ with entity
   classification
   $\mathcal{E} = \langle X, Y, \models_{\mathcal{E}} \rangle$, schema
   $\mathcal{S} = (X, I, s)$, key set $K$, and tuple function
   $t : K \to tup_{\mathcal{E}}(I, s)$, is a one-object relational database
   with the same entity classification, the terminal category of relation
   types (symbols) $\mathbf{1} = \{ * \}$, the signature functor with single
   $X$-signature
   $\mathbf{1} \xrightarrow{(I,s)} (\mathbf{Set}{\downarrow}X)$, the key
   functor with single key set
   $\mathbf{1}^{op} = \mathbf{1} \xrightarrow{K} \mathbf{Set}$, the tuple
   natural transformation with single component tuple function
   $t : K \to tup_{\mathcal{E}}(I, s)$, and the table functor with single
   $\mathcal{E}$-table
   $\mathbf{1}^{op} = \mathbf{1} \xrightarrow{\mathcal{T}} (\mathbf{Set}{\downarrow}tup_{\mathcal{E}})$.

Classification. A classification
   $\mathcal{E} = \langle X, Y, \models_{\mathcal{E}} \rangle$ is a
   relational database
   $db(\mathcal{E}) = (\geq_{\mathcal{E}}, \uparrow_{\mathcal{E}}, \mathcal{E}, ext_{\mathcal{E}}, \tau_{\mathcal{E}})$,
   where the entity classification is itself
   ($db \circ cls = id_{\mathbf{Cls}}$).¹¹ The additional components are
   described as follows. The category of relation types (symbols) is the
   reverse conceptual preorder (generalization-specialization order) on
   entity types $\langle X, \geq_{\mathcal{E}} \rangle$ with
   $x' \geq_{\mathcal{E}} x$ when
   $ext_{\mathcal{E}}(x') \supseteq ext_{\mathcal{E}}(x)$; that is, when $x'$
   is at least as general as $x$. The signature functor is the principal
   filter operator
   $\langle X, \geq_{\mathcal{E}} \rangle \xrightarrow{\uparrow_{\mathcal{E}}} (\mathbf{Set}{\downarrow}X)$;
   it maps an entity type $x \in X$ to the $X$-signature
   $({\uparrow}x, inc_x)$, where
   ${\uparrow}x = \{ x' \in X \mid x' \geq_{\mathcal{E}} x \}$ is the
   principal filter of $x$ and $inc_x : {\uparrow}x \hookrightarrow X$ is
   inclusion; and it maps an ordering $x' \geq_{\mathcal{E}} x$ with
   ${\uparrow}x' \subseteq {\uparrow}x$ to the arity inclusion function
   $inc_{x',x} : {\uparrow}x' \hookrightarrow {\uparrow}x$. For any entity
   type $x \in X$, the $\mathcal{E}$-tuple functor applied to the
   $X$-signature $({\uparrow}x, inc_x)$ is the tuple set
   $tup_{\mathcal{E}}({\uparrow}x, inc_x) = \{ t : {\uparrow}x \to Y \mid t(x') \models_{\mathcal{E}} x', \forall x' \geq_{\mathcal{E}} x \}$.
   The key functor is the extent operator
   $\langle X, \leq_{\mathcal{E}} \rangle \xrightarrow{ext_{\mathcal{E}}} \mathbf{Set}$,
   which maps an entity type $x \in X$ to its extent
   $ext_{\mathcal{E}}(x) = \{ y \in Y \mid y \models_{\mathcal{E}} x \}$ and
   maps the ordering $x \leq_{\mathcal{E}} x'$ with
   $ext_{\mathcal{E}}(x) \subseteq ext_{\mathcal{E}}(x')$ to the extent
   inclusion function
   $ext_{\mathcal{E}}(x, x') : ext_{\mathcal{E}}(x) \hookrightarrow ext_{\mathcal{E}}(x')$.
   For any entity type $x \in X$, the tuple function
   $\tau_{\mathcal{E}}(x) : ext_{\mathcal{E}}(x) \to tup_{\mathcal{E}}({\uparrow}x, inc_x)$
   maps an entity instance $y \in ext_{\mathcal{E}}(x)$ to the constant tuple
   ${\uparrow}x \xrightarrow{y} Y$. This defines a natural transformation
   $\tau_{\mathcal{E}} : ext_{\mathcal{E}} \Rightarrow (\uparrow_{\mathcal{E}})^{op} \circ tup_{\mathcal{E}}$.

[Diagram for the Classification example: $y \models_{\mathcal{E}} x \leq_{\mathcal{E}} x'$,
with $ext_{\mathcal{E}}(x) \subseteq ext_{\mathcal{E}}(x')$ and
${\uparrow}x \supseteq {\uparrow}x'$.]

8 The components determining variance between
$\mathcal{T} = (\mathcal{S}, \mathcal{E}, K, t)$ and
$\mathcal{T}' = (\mathcal{S}, \mathcal{E}', K', t')$ are the entity instance
function (varying instances and their incidence or classification relations)
and the key function (varying the set of keys and the tuple natural
transformations).

9 If $\mathcal{A}$ and $\mathcal{B}$ are cocomplete,
$T : \mathcal{A} \to \mathcal{C}$ is a cocontinuous functor, and
$S : \mathcal{B} \to \mathcal{C}$ is any functor (not necessarily
cocontinuous), then the comma category $(T{\downarrow}S)$ will also be
cocomplete.

10 A relational database schema $\mathcal{S} = (\mathbf{R}, X, S)$ consists
of a category of relation symbols $\mathbf{R}$, a set of entity types $X$,
and a signature functor $S : \mathbf{R} \to (\mathbf{Set}{\downarrow}X)$. Any
relational database schema $\mathcal{S} = (\mathbf{R}, X, S)$ with colimit
reference schema $(I, s) = \coprod S$ defines a type language
$lang(\mathcal{S})$ in the Information Flow Framework [9] with reference
component $(X, I, s)$ and signature component $(\mathbf{R}, \partial)$. We
regard the colimit $X$-signature $(I, s) = \coprod S$ to be a reference
schema $(X, I, s)$ with reference (sort) function $I \xrightarrow{s} X$ from
a universal set of variables $I$ to the type set $X$. For any relation symbol
$r \in \mathbf{R}$, the colimit injection
$S(r) = (I_r, s_r) \xrightarrow{\iota_r} (I, s)$, whose condition
$s_r = \iota_r \cdot s$ expresses the $s$-alignment of $s_r$ via $\iota_r$,
states that the signature $(I_r, s_r)$ is below (at least as general as) the
colimit signature $(I, s)$. The signature functor factors
$S = \partial \circ inc : \mathbf{R} \to (\mathbf{Set}{\downarrow}X)$ through
$sign(I, s) \subseteq (\mathbf{Set}{\downarrow}X)$, the subcategory of
$X$-signatures below $(I, s)$.

11 Since any preorder $\mathcal{P} = \langle P, \leq \rangle$ is a
classification $\mathcal{P} = \langle P, P, \leq \rangle$, a preorder is a
relational database.
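
The components of the database $db(\mathcal{E})$ can be computed directly for a finite classification. The sketch below uses a hypothetical classification of two instances by three types; the function names are ours. It computes the extent (key functor), the principal filter (arity of the signature functor), and the constant tuple assigned to an instance by the tuple function.

```python
# Hypothetical finite classification; the types and instances are illustrative only.
X = ["Animal", "Dog", "Bird"]
Y = ["rex", "tweety"]
INCIDENCE = {("rex", "Dog"), ("rex", "Animal"), ("tweety", "Bird"), ("tweety", "Animal")}


def ext(x):
    """Key functor on objects: the extent of the entity type x."""
    return {y for y in Y if (y, x) in INCIDENCE}


def at_least_as_general(x2, x):
    """x2 >= x in the conceptual preorder: ext(x2) contains ext(x)."""
    return ext(x2) >= ext(x)


def principal_filter(x):
    """Arity of the signature assigned to x: all types at least as general as x."""
    return {x2 for x2 in X if at_least_as_general(x2, x)}


def constant_tuple(x, y):
    """Tuple function at x: send an instance y in ext(x) to the constant tuple on the filter of x."""
    if y not in ext(x):
        raise ValueError(f"{y!r} is not in the extent of {x!r}")
    return {x2: y for x2 in principal_filter(x)}


t = constant_tuple("Dog", "rex")
# Every component of the constant tuple is classified by its type,
# because y |= x and x2 >= x together give y |= x2.
print(t, all((value, x2) in INCIDENCE for x2, value in t.items()))
```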

IFF Structure. Using the key functor, we can define the relation
   classification $\mathcal{R} = \langle R, K, \models_{\mathcal{R}} \rangle$
   with type set $R = obj(\mathbf{R})$, instance set
   $K = \bigcup_{r \in R} K(r)$, and incidence $k \models_{\mathcal{R}} r$
   when $k \in K(r)$ for key $k \in K = \bigcup_{r \in R} K(r)$ and relation
   symbol $r \in R = obj(\mathbf{R})$. The elements (keys) in $K$ are called
   abstract tuples in the Information Flow Framework [9]. Any relational
   database $\mathcal{D} = (\mathcal{S}, \mathcal{E}, K, \tau)$ determines a
   structure (model) $struc(\mathcal{D})$ in the Information Flow
   Framework [9] with type language $lang(\mathcal{S})$, entity
   classification $\mathcal{E}$, semidesignation $(I, s, \mathcal{E})$ and
   relation classification $\mathcal{R}$. This is an adjoint situation: any
   IFF structure determines a relational database.

Unified Database. A unified database is a special case of a database, whose
   relation classification coincides with its entity classification
   $\mathcal{R} = \mathcal{E}$. Unified databases allow the introduction of
   foreign keys. In fact, columns are either the single primary key or a
   foreign key. The entries in a column are keys of the type of the column.
   Actual datatypes, such as strings or numbers, can be regarded as primary
   keys of themselves. Conversely, we can think of any relational table with
   a single column, one whose schema is of the form $1 \to X$, to be a set of
   entities.

Any relational database schema $\mathcal{S} = (\mathbf{R}, X, S)$ in unified
form ($R = obj(\mathbf{R}) = X$) has an associated sketch. Define the arity
functor
$A = S \circ set_X : \mathbf{R} \to (\mathbf{Set}{\downarrow}X) \to \mathbf{Set}$.
Let $\int A \to \mathbf{R}$ denote the Grothendieck construction of $A$ with
object set
$\bigcup_{r \in R} A(r) = \{ (r, i) \mid r \in R, i \in I, (I, s) = S(r) \}$
and morphisms $(r', i') \to (r, i)$ for $\mathbf{R}$-constraints $r' \to r$.
The graph $gph(\mathcal{S})$ of the sketch has node set $R$ and edges
$(r, i) \in \int A$ with source and target $r \xrightarrow{(r,i)} s(i)$. This
graph is actually 2-dimensional, given the $\mathbf{R}$-constraints. The
sketch specifies a cone for the signature of each relation type $r \in R$ and
constraints for the commuting diagrams in $\mathbf{R}$. Any relational
database $\mathcal{D} = (\mathcal{S}, \mathcal{E}, K, \tau)$ in unified form
has an associated sketch interpretation
$gph(\mathcal{S})^{op} \to \mathbf{Set}$. The interpretation maps a node
(relation type) $r \in R$ to $K(r)$, the set of keys of $r$, and maps an edge
$r \xrightarrow{(r,i)} s(i)$ to the map
$K(r) \to K(s(i)) : k \mapsto \tau_r(k)(i)$, where
$\tau_r(k) \in tup_{\mathcal{E}}(I, s)$. This also is 2-dimensional.
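
The sketch graph and its interpretation can be illustrated on the foreign key columns of the Emp/Dept example, as we read them from the tables above. The dictionary encoding and the function names are our assumptions; the 2-dimensional structure (constraints between relation types) is omitted, so this covers only nodes and edges.

```python
# Foreign-key part of the Emp/Dept example in unified form (dictionary encoding is ours).
SCHEMA = {"Emp": {"dept": "Dept"}, "Dept": {"mngr": "Emp"}}
DATABASE = {
    "Emp": {"e1": {"dept": "d1"}, "e2": {"dept": "d1"}, "e3": {"dept": "d1"}},
    "Dept": {"d1": {"mngr": "e3"}, "d2": {"mngr": "e2"}},
}


def sketch_edges(schema):
    """Edges (r, i) of the sketch graph, each with its target s(i)."""
    return [(r, i, target) for r, signature in schema.items() for i, target in signature.items()]


def interpret(schema, database):
    """Map each edge (r, i) to the function K(r) -> K(s(i)) sending k to tau_r(k)(i)."""
    return {
        (r, i): {k: row[i] for k, row in database[r].items()}
        for r, i, _ in sketch_edges(schema)
    }


def is_interpretation(schema, database):
    """Referential integrity: every edge function lands in the key set of its target."""
    maps = interpret(schema, database)
    return all(
        value in database[target]
        for r, i, target in sketch_edges(schema)
        for value in maps[(r, i)].values()
    )


maps = interpret(SCHEMA, DATABASE)
# Composing two edge functions: the manager of the department of employee e1.
print(maps[("Dept", "mngr")][maps[("Emp", "dept")]["e1"]])
```

Since each edge is interpreted as a total function between key sets, the existence of the interpretation is exactly what referential integrity requires.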

A relational database morphism
$(F, \theta, f, g, \kappa) : \mathcal{D}_2 = (\mathcal{S}_2, \mathcal{E}_2, K_2, \tau_2) \to (\mathcal{S}_1, \mathcal{E}_1, K_1, \tau_1) = \mathcal{D}_1$
consists of a relational database schema morphism
$(F, \theta, f) : \mathcal{S}_2 \to \mathcal{S}_1$,¹² an entity infomorphism
$(f, g) : \mathcal{E}_2 = \langle X_2, Y_2, \models_{\mathcal{E}_2} \rangle \rightleftarrows \langle X_1, Y_1, \models_{\mathcal{E}_1} \rangle = \mathcal{E}_1$
with a common (entity) type function component $f : X_2 \to X_1$ and a
universe (entity instance) function $g : Y_1 \to Y_2$, and a key natural
transformation $\kappa : F^{op} \circ K_1 \Rightarrow K_2$, which satisfy the
condition
$\kappa \cdot \tau_2 = F^{op}\tau_1 \cdot \theta^{op} tup_{\mathcal{E}_1} \cdot S_2^{op} \tau_{(f,g)}$.¹³
It is strict or trim when the underlying relational database schema morphism
is strict or trim ($\theta = 1$). Figure 2 illustrates in detail a relational
database morphism. Here are some examples of relational database morphisms.

Table morphism. A relational database morphism $(F, \theta, f, g, \kappa)$
   with one-object source and target categories of relations and identity
   relation functor $F = id_{\mathbf{1}} : \mathbf{1} \to \mathbf{1}$ is
   identical to a single morphism of tables
   $(k, h, f, g) : (K_1, t_1, I_1, s_1, \mathcal{E}_1) \to (K_2, t_2, I_2, s_2, \mathcal{E}_2)$,
   except that the direction has switched.

Infomorphism. An entity infomorphism
   $(f, g) : \mathcal{E}_2 = \langle X_2, Y_2, \models_{\mathcal{E}_2} \rangle \rightleftarrows \langle X_1, Y_1, \models_{\mathcal{E}_1} \rangle = \mathcal{E}_1$
   is a relational database morphism between the classifications regarded as
   relational databases, where the following hold. The type function is
   monotonic
   $f : \langle X_2, \geq_{\mathcal{E}_2} \rangle \to \langle X_1, \geq_{\mathcal{E}_1} \rangle$,
   mapping an ordering $x_2' \geq_{\mathcal{E}_2} x_2$ with
   $ext_{\mathcal{E}_2}(x_2') \supseteq ext_{\mathcal{E}_2}(x_2)$ to the
   ordering $f(x_2') \geq_{\mathcal{E}_1} f(x_2)$ with
   $ext_{\mathcal{E}_1}(f(x_2')) \supseteq ext_{\mathcal{E}_1}(f(x_2))$,
   since $y_1 \models_{\mathcal{E}_1} f(x_2)$ iff
   $g(y_1) \models_{\mathcal{E}_2} x_2$ implies
   $g(y_1) \models_{\mathcal{E}_2} x_2'$ iff
   $y_1 \models_{\mathcal{E}_1} f(x_2')$. For each type $x_2 \in X_2$, the
   type function $f : X_2 \to X_1$ restricts to an arity function
   $(\uparrow_{\mathcal{E}_2}(x_2), inc_{x_2} \cdot f) \to (\uparrow_{\mathcal{E}_1}(f(x_2)), inc_{f(x_2)})$.
   This is the $x_2$-th component function of a signature natural
   transformation
   $\theta : \uparrow_{\mathcal{E}_2} \circ \Sigma_f \Rightarrow f \circ \uparrow_{\mathcal{E}_1}$.
   For each type $x_2 \in X_2$, the instance function $g : Y_1 \to Y_2$
   restricts to a function
   $ext_{\mathcal{E}_1}(f(x_2)) \to ext_{\mathcal{E}_2}(x_2)$, since an
   instance $y_1 \in Y_1$ satisfying $y_1 \models_{\mathcal{E}_1} f(x_2)$
   determines the instance $g(y_1) \in Y_2$ satisfying
   $g(y_1) \models_{\mathcal{E}_2} x_2$. This is the $x_2$-th component
   function of a key natural transformation
   $g : f^{op} \circ ext_{\mathcal{E}_1} \Rightarrow ext_{\mathcal{E}_2}$.
   Finally, for any entity type $x_2 \in X_2$, the two functions
   $ext_{\mathcal{E}_1}(f(x_2)) \to ext_{\mathcal{E}_2}(x_2) \to tup_{\mathcal{E}_2}({\uparrow}x_2, inc_{x_2})$
   and
   $ext_{\mathcal{E}_1}(f(x_2)) \to tup_{\mathcal{E}_1}({\uparrow}f(x_2), inc_{f(x_2)}) \to tup_{\mathcal{E}_1}({\uparrow}x_2, inc_{x_2} \cdot f) \to tup_{\mathcal{E}_2}({\uparrow}x_2, inc_{x_2})$
   are equal.

IFF Structure Morphism. Using the key natural transformation
   $\kappa : F^{op} \circ K_1 \Rightarrow K_2$, we can define the relation
   infomorphism
   $(F, K) : \mathcal{R}_2 = \langle R_2, K_2, \models_{\mathcal{R}_2} \rangle \rightleftarrows \langle R_1, K_1, \models_{\mathcal{R}_1} \rangle = \mathcal{R}_1$
   with type function $obj(F) : R_2 \to R_1$ and instance function
   $K : K_1 \to K_2 : k_1 \mapsto \kappa_{r_2}(k_1)$ using the $r_2$-th
   component function $\kappa_{r_2} : K_1(r_1) \to K_2(r_2)$ for each key
   $k_1 \in K_1(r_1)$ of an image relation type $r_1 = F(r_2)$ ($K$ is
   defined by arbitrary choice, otherwise). Any strict relational database
   morphism
   $(F, f, g, \kappa) : \mathcal{D}_2 = (\mathcal{S}_2, \mathcal{E}_2, K_2, \tau_2) \to (\mathcal{S}_1, \mathcal{E}_1, K_1, \tau_1) = \mathcal{D}_1$
   determines a structure (model) morphism
   $struc(F, f, g, \kappa) : struc(\mathcal{D}_2) \to struc(\mathcal{D}_1)$
   in the Information Flow Framework [9] with type language morphism
   $lang(F, f) : lang(\mathcal{S}_2) \to lang(\mathcal{S}_1)$, entity
   infomorphism
   $(f, g) : \mathcal{E}_2 = \langle X_2, Y_2, \models_{\mathcal{E}_2} \rangle \rightleftarrows \langle X_1, Y_1, \models_{\mathcal{E}_1} \rangle = \mathcal{E}_1$,
   semidesignation morphism
   $(\coprod F, f, g) : (I_2, s_2, \mathcal{E}_2) \to (I_1, s_1, \mathcal{E}_1)$
   and relation infomorphism
   $(F, K) : \mathcal{R}_2 \rightleftarrows \mathcal{R}_1$.

12 A relational database schema morphism
$(F, \theta, f) : \mathcal{S}_2 = (\mathbf{R}_2, X_2, S_2) \to (\mathbf{R}_1, X_1, S_1) = \mathcal{S}_1$
consists of a relation functor $F : \mathbf{R}_2 \to \mathbf{R}_1$, a
function on entity types $f : X_2 \to X_1$ and a signature natural
transformation $\theta : S_2 \circ \Sigma_f \Rightarrow F \circ S_1$. Any
(strict) relational database schema morphism
$(F, f) : \mathcal{S}_2 \to \mathcal{S}_1$ determines a type language
morphism $lang(F, f) : lang(\mathcal{S}_2) \to lang(\mathcal{S}_1)$ in the
Information Flow Framework [9], since we have the commutative diagram
$F \circ \partial_1 = \partial_2 \circ sign(\coprod F, f)$.

13 $\tau_{(f,g)}$ is the tuple natural transformation in the morphism of
presheaves
$tup(f, g) = \langle \Sigma_f, f^{*}, \tau_{(f,g)} \rangle : ((\mathbf{Set}{\downarrow}X_2), tup_{\mathcal{E}_2}) \rightleftarrows ((\mathbf{Set}{\downarrow}X_1), tup_{\mathcal{E}_1})$
coming from the tuple functor
$tup : \mathbf{Cls} \to (\mathbf{Adj}{\Downarrow}\mathbf{Set})$.

14 Since any pair of adjoint monotonic functions
$(f, g) : \langle P_2, \leq_2 \rangle \rightleftarrows \langle P_1, \leq_1 \rangle$
is an infomorphism
$(f, g) : \langle P_2, P_2, \leq_2 \rangle \rightleftarrows \langle P_1, P_1, \leq_1 \rangle$,
such a pair is a relational database morphism.

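Composition of morphisms is defined component-wise. Let $\mathbf{Db}$ denote
the category of relational databases with the two projections ($dbs$ is
called the schema functor)
$\mathbf{Dbs} \xleftarrow{dbs} \mathbf{Db} \xrightarrow{cls} \mathbf{Cls}$
and the key functor
$\mathbf{Db} \xrightarrow{key} (\mathbf{Cat}{\Downarrow}\mathbf{Set})$
mapping $\mathcal{D}$ to $(\mathbf{R}, K)$ and
$(F, \theta, f, g, \kappa) : \mathcal{D}_2 = (\mathcal{S}_2, \mathcal{E}_2, K_2, \tau_2) \to (\mathcal{S}_1, \mathcal{E}_1, K_1, \tau_1) = \mathcal{D}_1$
to $(F, \kappa) : (\mathbf{R}_2, K_2) \to (\mathbf{R}_1, K_1)$.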



[Diagram: a database morphism $\langle F, \theta, f, g, \kappa \rangle : \langle S_2, \mathcal{E}_2, K_2, \tau_2 \rangle \to \langle S_1, \mathcal{E}_1, K_1, \tau_1 \rangle$ in Db and its images under the functors labelled key, cls, sch, rel and typ, landing in the categories $(\mathrm{Cat} \Downarrow \mathrm{Set})$, Cls, Dbs, SDsgn, Cat and Set.]



Proposition 6. There is a diagram functor $\mathrm{Db} \xrightarrow{\;dgm\;} (\mathrm{Cat} \Downarrow \mathrm{Tbl})$ from databases
to (the lax comma category of) diagrams of tables.

Recall that the limit operation is a functor $(\mathrm{Cat} \Downarrow \mathrm{Tbl})^{\mathrm{op}} \xrightarrow{\;\lim\;} \mathrm{Tbl}$.

Definition 1. The join functor is defined to be the composition
$join = dgm^{\mathrm{op}} \circ \lim : \mathrm{Db}^{\mathrm{op}} \to \mathrm{Tbl}$.

Corollary 1. The schema of the join of a database is the reference (colimit) of
the underlying database schema.
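
To make Definition 1 concrete, the following is a minimal sketch of the limit of a finite diagram, computed at the level of finite sets of rows only. It ignores signatures, classifications and infomorphisms, so it is a simplification of the limit in Tbl rather than the construction itself. The function `limit` and the example tables `emp`, `dept` and `dept_ids` are hypothetical.

```python
from itertools import product


def limit(nodes, arrows):
    """Limit of a finite diagram of finite sets.

    nodes:  dict mapping a node name to a list of elements
    arrows: list of (source, target, function) triples
    Returns every family (one element per node) compatible with all arrows.
    """
    names = list(nodes)
    families = []
    for choice in product(*(nodes[name] for name in names)):
        family = dict(zip(names, choice))
        if all(fn(family[src]) == family[tgt] for src, tgt, fn in arrows):
            families.append(family)
    return families


# Hypothetical span-shaped database: two tables projecting to a shared column.
nodes = {
    "emp": [("ann", "d1"), ("bob", "d2"), ("cy", "d3")],  # (name, dept)
    "dept": [("d1", "Paris"), ("d2", "Oslo")],            # (dept, city)
    "dept_ids": ["d1", "d2", "d3"],
}
arrows = [
    ("emp", "dept_ids", lambda row: row[1]),   # projection to the dept column
    ("dept", "dept_ids", lambda row: row[0]),  # projection to the dept column
]

# Each compatible family is one row of the join.
joined = sorted(
    (fam["emp"][0], fam["emp"][1], fam["dept"][1]) for fam in limit(nodes, arrows)
)
print(joined)
```

In this toy case each joined row carries the columns name, dept and city, that is, the two column sets glued along the shared dept column, which is the shape Corollary 1 describes for the schema of the join.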

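[Diagram: the functors dgm, dbs, sch, colim and refer relating Db, $(\mathrm{Cat} \Downarrow \mathrm{Tbl})$, $\mathrm{Tbl}^{\mathrm{op}}$, Dbs, the comma category of diagrams of schemas, and Sch; it expresses Corollary 1.]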



In any complete category, the limits of arbitrary finite diagrams can be
constructed by using only the terminal object and (binary) pullbacks. Dually, in
any cocomplete category, the colimits of arbitrary finite diagrams can be
constructed by using only the initial object and (binary) pushouts. As we have
shown, for any entity classification $\mathcal{E} = \langle X, Y, \models_{\mathcal{E}} \rangle$, the category of
$\mathcal{E}$-tables is complete. Hence, for any finite database schema $S$ the join of
arbitrary $S$-databases can be constructed by using only the join of the empty
database (the terminal $\mathcal{E}$-table) and the join of databases with binary span
$X$-schemas (two $\mathcal{E}$-tables connected through a third).
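
The reduction to binary pullbacks can be illustrated with finite sets of rows. The sketch below is a simplification under stated assumptions: tables are plain lists of tuples, the connecting "third" table is represented only by the two projection functions, and the names `pullback`, `TERMINAL`, `emp`, `dept` and `city` are hypothetical. It builds a three-table join from two binary pullbacks and compares it with a direct computation.

```python
def pullback(left, right, f, g):
    """Binary pullback of finite sets: all pairs (a, b) with f(a) == g(b)."""
    return [(a, b) for a in left for b in right if f(a) == g(b)]


# One-element set, playing the role of the join of the empty database.
TERMINAL = [()]

emp = [("ann", "d1"), ("bob", "d2"), ("cy", "d3")]  # (name, dept)
dept = [("d1", "c1"), ("d2", "c2")]                 # (dept, city)
city = [("c1", "FR"), ("c2", "NO")]                 # (city, country)

# First pullback: employees and departments agreeing on the dept column.
emp_dept = pullback(emp, dept, lambda e: e[1], lambda d: d[0])

# Second pullback: the result and cities agreeing on the city column.
chain = pullback(emp_dept, city, lambda ed: ed[1][1], lambda c: c[0])
iterated = sorted((e[0], d[0], c[0], c[1]) for (e, d), c in chain)

# Direct computation of the same three-table join, for comparison.
direct = sorted(
    (e[0], d[0], c[0], c[1])
    for e in emp
    for d in dept
    for c in city
    if e[1] == d[0] and d[1] == c[0]
)

# Pulling back against the terminal set over a one-point set changes nothing
# up to isomorphism: each row is simply paired with the empty tuple.
with_terminal = pullback(emp, TERMINAL, lambda e: 0, lambda t: 0)

print(iterated == direct, len(with_terminal) == len(emp))
```

The iterated pullback and the direct join agree on this example, which is the finite-set shadow of building a larger join from span-shaped joins.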
[Figure: two upper diagrams of functors and natural transformations, and a large central diagram on the $r_2$-th component showing the source and target tables of a relational database morphism.]

$\kappa \bullet \tau_2 = F^{\mathrm{op}} \tau_1 \bullet \theta^{\mathrm{op}} \mathrm{tup}_{\mathcal{E}_1} \bullet S_2^{\mathrm{op}} \tau_{\langle f, g \rangle}$

$K_1(r_1) \xrightarrow{\kappa_{r_2}} K_2(r_2) \xrightarrow{\tau_2(r_2)} \mathrm{tup}_{\mathcal{E}_2}(I_2, s_2) \;=\; K_1(r_1) \xrightarrow{\tau_1(r_1)} \mathrm{tup}_{\mathcal{E}_1}(I_1, s_1) \xrightarrow{\mathrm{tup}_{\mathcal{E}_1}(\theta_{r_2})} \mathrm{tup}_{\mathcal{E}_1}(I_2, s_2 \cdot f) \xrightarrow{\tau_{\langle f, g \rangle}(I_2, s_2)} \mathrm{tup}_{\mathcal{E}_2}(I_2, s_2)$



This figure illustrates the condition on relational database morphisms. It
has been annotated to guide understanding. The condition is stated
symbolically in the two lines of text just above. The top line states the
condition in terms of natural transformations. The bottom line states the
condition in terms of set functions on the $r_2$-th component for some source
relation type $r_2 \in R_2$. The large diagram in the center illustrates the
condition. The two upper diagrams give alternate views of this. The top right
diagram is in a form very much like a table morphism. This is appropriate,
since a relational database morphism between single table databases is just
a table morphism. Finally, we have illustrated the effect of the morphism
on the source/target tables, starting with a source relational constraint
(morphism) $p_2 : r'_2 \to r_2$.



Fig. 2. Relational Database Morphism

5 Summary and Future Work

We have defined the semantics for the relational database logical environment,
which can be used to specify database system consequence. This provides
interpretations for various formalisms such as relational algebra and first
order logic, where terms and equations can be included by replacing signature
morphisms with (possibly quotiented) term-tuples. The two most important
achievements of this paper are the definition of a natural and general category
of tables that is both complete and cocomplete, and the definition of a
morphism of databases with some very nice properties. We have extended the
notion of tables (Spivak), first from an underlying entity type specification
to an entity classification (which models multi-inheritance), second from the
static case of an underlying entity classification to the dynamic case of
tables moving along an underlying entity infomorphism. We have proven
completeness and cocompleteness for this (larger) category of tables.
Completeness allows joins over arbitrary collections of tables that are
possibly linked by projections. This includes selection, which is the join with
respect to reference relations (tables). Cocompleteness allows a distributed
union that is new.

However, much work remains to be done. We need to investigate further
properties of database morphisms, including continuity. In a follow-up paper we
will develop various formalisms, such as relational algebra and first order
logic, and define views and queries. This will deepen the connection with the
Information Flow Framework. Functional dependencies and normal forms should be
expressed in terms of the categorical structure. For practical database
maintenance, modifications (insertion, deletion and update) need to be defined.
The unified form (plus its graphical representation) needs further development.
Finally, the theory of databases defined in this paper should be more closely
compared and contrasted with other approaches, such as the simplicial database
approach (Spivak [6], [7]) and the sketch approach (Johnson, Rosebrugh et al [4]).